UFCFEL-15-3 Security Data Analytics and Visualisation

Portfolio Assignment: Part 3

Academic year: 2023-24

Conduct a security investigation into a suspected insider threat


UWEtech are calling you back once more to help them with their security challenges. They believe that one of their employees has been the cause of their recent security problems, and they believe they may have an insider threat within the company. They enlist your help to examine employee log activity, to see what behaviours deviate from the norm and to identify which user may be acting as a threat to their organisation.

Dataset: You will be issued a unique dataset based on your UWE student ID. Failure to use the dataset that corresponds to your student ID will result in zero marks. Please access the datasets via Blackboard.

This exercise carries a weight of 45% towards your overall portfolio submission.

Submission Documents


For Part 3 of your portfolio, your complete output file should be saved as:

  • STUDENT_ID-PART3.ipynb

This should then be included in a ZIP file along with your other two portfolio documents.

The deadline for your portfolio submission is THURSDAY 11th JANUARY @ 14:00.

DATASET: Load in the data

Please provide the string below that you have been assigned as given in the spreadsheet available on Blackboard. The directory containing your dataset should be at the same location as your notebook file.

In [1]:

Function for loading data - do not change

In [2]:
/var/folders/61/yb7st_9d6c9frlz4phr4vcyh0000gn/T/ipykernel_61839/3678823198.py:11: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  email_data = pd.read_csv('./' + DATASET + '/email_data.csv', parse_dates=True, index_col=0)
/var/folders/61/yb7st_9d6c9frlz4phr4vcyh0000gn/T/ipykernel_61839/3678823198.py:12: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  file_data = pd.read_csv('./' + DATASET + '/file_data.csv', parse_dates=True, index_col=0)
/var/folders/61/yb7st_9d6c9frlz4phr4vcyh0000gn/T/ipykernel_61839/3678823198.py:13: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  web_data = pd.read_csv('./' + DATASET + '/web_data.csv', parse_dates=True, index_col=0)
/var/folders/61/yb7st_9d6c9frlz4phr4vcyh0000gn/T/ipykernel_61839/3678823198.py:14: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  login_data = pd.read_csv('./' + DATASET + '/login_data.csv', parse_dates=True, index_col=0)
/var/folders/61/yb7st_9d6c9frlz4phr4vcyh0000gn/T/ipykernel_61839/3678823198.py:15: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  usb_data = pd.read_csv('./' + DATASET + '/usb_data.csv', parse_dates=True, index_col=0)
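The UserWarnings above appear to come from `parse_dates=True`. Combined with `index_col=0`, that option asks pandas to parse the index, which here is the unnamed integer column, and not the `datetime` column. The loader is marked "do not change", so the following is only a minimal sketch of how the `datetime` column could be converted explicitly afterwards. It assumes pandas 2.0 or later for `format="ISO8601"` and timestamps shaped like those in the outputs below. The helper `load_log` and the two-row CSV are hypothetical; the rows are copied from Out[4].

```python
import io

import pandas as pd

# Hypothetical extract shaped like login_data.csv: an unnamed integer index
# in column 0, followed by datetime, user, action and pc columns.
SAMPLE_LOGIN_CSV = """,datetime,user,action,pc
0,2022-01-01 00:02:43,usr-zrp,login,pc15
1,2022-01-01 00:05:17,usr-evy,login,pc92
"""


def load_log(source) -> pd.DataFrame:
    """Read one log CSV and convert its 'datetime' column to datetime64."""
    # No parse_dates here: with index_col=0, parse_dates=True would target
    # the integer index rather than the 'datetime' column.
    df = pd.read_csv(source, index_col=0)
    # format="ISO8601" (pandas >= 2.0) accepts timestamps with or without
    # fractional seconds, e.g. '2022-05-12 19:32:56.753903'.
    df["datetime"] = pd.to_datetime(df["datetime"], format="ISO8601")
    return df


login_sample = load_log(io.StringIO(SAMPLE_LOGIN_CSV))
```

With the datetime column parsed once at load time, later filters such as `df["datetime"].dt.month == 1` need no repeated conversion.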

The following code samples may be useful to aid your investigation

In [3]:
Out[3]:
user role email pc
0 usr-lqi Technical usr-lqi@uwetech.com pc0
1 usr-kga Security usr-kga@uwetech.com pc1
2 usr-wkx Director usr-wkx@uwetech.com pc2
3 usr-sfo Finance usr-sfo@uwetech.com pc3
4 usr-cgh Security usr-cgh@uwetech.com pc4
... ... ... ... ...
245 usr-nxs HR usr-nxs@uwetech.com pc245
246 usr-rri HR usr-rri@uwetech.com pc246
247 usr-agk Finance usr-agk@uwetech.com pc247
248 usr-pcs HR usr-pcs@uwetech.com pc248
249 usr-xfr HR usr-xfr@uwetech.com pc249

250 rows × 4 columns

In [4]:
Out[4]:
datetime user action pc
0 2022-01-01 00:02:43 usr-zrp login pc15
1 2022-01-01 00:05:17 usr-evy login pc92
2 2022-01-01 00:15:12 usr-ubr login pc119
3 2022-01-01 00:18:24 usr-pnn login pc169
4 2022-01-01 00:30:04 usr-mbh login pc178
... ... ... ... ...
151995 2022-10-31 23:48:21 usr-kmz logoff pc130
151996 2022-10-31 23:52:36 usr-sxl logoff pc201
151997 2022-10-31 23:56:04 usr-zog logoff pc206
151998 2022-10-31 23:59:09 usr-ubr logoff pc119
151999 2022-10-31 23:59:33 usr-gcn logoff pc118

152000 rows × 4 columns

In [5]:
Out[5]:
datetime user action pc
174 2022-01-01 08:12:08 usr-sfo login pc3
387 2022-01-01 17:59:48 usr-sfo logoff pc3
718 2022-01-02 08:47:23 usr-sfo login pc3
870 2022-01-02 17:06:22 usr-sfo logoff pc3
1176 2022-01-03 08:21:10 usr-sfo login pc3
... ... ... ... ...
150891 2022-10-29 17:54:10 usr-sfo logoff pc3
151201 2022-10-30 08:32:20 usr-sfo login pc3
151399 2022-10-30 18:27:43 usr-sfo logoff pc3
151638 2022-10-31 07:15:11 usr-sfo login pc3
151900 2022-10-31 18:26:56 usr-sfo logoff pc3

608 rows × 4 columns

In [6]:
Out[6]:
datetime user action pc
174 2022-01-01 08:12:08 usr-sfo login pc3
387 2022-01-01 17:59:48 usr-sfo logoff pc3
718 2022-01-02 08:47:23 usr-sfo login pc3
870 2022-01-02 17:06:22 usr-sfo logoff pc3
1176 2022-01-03 08:21:10 usr-sfo login pc3
... ... ... ... ...
150891 2022-10-29 17:54:10 usr-sfo logoff pc3
151201 2022-10-30 08:32:20 usr-sfo login pc3
151399 2022-10-30 18:27:43 usr-sfo logoff pc3
151638 2022-10-31 07:15:11 usr-sfo login pc3
151900 2022-10-31 18:26:56 usr-sfo logoff pc3

608 rows × 4 columns

In [7]:
Out[7]:
array(['Technical', 'Security', 'Director', 'Finance', 'Services',
       'Legal', 'HR'], dtype=object)
In [8]:
In [9]:
Out[9]:
['usr-sfo',
 'usr-mbw',
 'usr-jdl',
 'usr-zko',
 'usr-yzg',
 'usr-nap',
 'usr-hiz',
 'usr-jor',
 'usr-xez',
 'usr-iii',
 'usr-puc',
 'usr-ybd',
 'usr-hvn',
 'usr-ybp',
 'usr-pou',
 'usr-gyd',
 'usr-txr',
 'usr-vgj',
 'usr-lmi',
 'usr-rsx',
 'usr-rzw',
 'usr-kjp',
 'usr-xze',
 'usr-otp',
 'usr-ebm',
 'usr-aqt',
 'usr-xad',
 'usr-viw',
 'usr-zzh',
 'usr-lyz',
 'usr-agk']

Question 1: For all Finance staff members during the month of January, show the distribution of when users logon and logoff by hour using one or more Bar Charts, and report the most common login and logoff time for this role.

Hint: Once you have filtered the data to only Finance staff in January, count the number of logons and logoffs that occur in each hour of the day.

(1 mark)
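Following the hint, here is a minimal sketch of the counting step. It assumes `login_data` and `employee_data` have the columns shown in Out[3] and Out[4], and that the logs cover a single year, as the Jan to Oct 2022 data does. The helper name `hourly_login_logoff` is hypothetical. The usr-sfo rows in the sample are taken from Out[5]; the remaining rows are made up for illustration.

```python
import pandas as pd


def hourly_login_logoff(login_data, employee_data, role="Finance", month=1):
    """Count login and logoff events per hour of day for one role in one month."""
    users = employee_data.loc[employee_data["role"] == role, "user"]
    ts = pd.to_datetime(login_data["datetime"])
    mask = login_data["user"].isin(users) & (ts.dt.month == month)
    hours = ts[mask].dt.hour.rename("hour")
    actions = login_data.loc[mask, "action"]
    # Rows: hours 0-23; columns: login/logoff; absent combinations become 0.
    return (
        pd.crosstab(hours, actions)
        .reindex(index=range(24), columns=["login", "logoff"], fill_value=0)
    )


# Illustrative sample (usr-agk and usr-lqi rows are hypothetical).
employees = pd.DataFrame({
    "user": ["usr-sfo", "usr-agk", "usr-lqi"],
    "role": ["Finance", "Finance", "Technical"],
})
logins = pd.DataFrame({
    "datetime": ["2022-01-01 08:12:08", "2022-01-01 17:59:48",
                 "2022-01-02 08:47:23", "2022-01-02 17:06:22",
                 "2022-01-03 09:05:00", "2022-01-03 17:30:00",
                 "2022-01-03 07:00:00", "2022-02-01 08:00:00"],
    "user": ["usr-sfo", "usr-sfo", "usr-sfo", "usr-sfo",
             "usr-agk", "usr-agk", "usr-lqi", "usr-sfo"],
    "action": ["login", "logoff", "login", "logoff",
               "login", "logoff", "login", "login"],
})
counts = hourly_login_logoff(logins, employees)
most_common_login = counts["login"].idxmax()
most_common_logoff = counts["logoff"].idxmax()
```

`counts.plot(kind="bar")` would then draw the logins and logoffs side by side for each hour. `idxmax()` on each column gives the most common hour.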

In [10]:

Question 2: Plot a multi-line chart that shows the logon and logoff times during the month of January for the user of pc42.

Hint: Filter the data as you need, and make two calls to plt.plot().

(1 mark)
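A minimal sketch of the filtering step, assuming `login_data` has the columns shown in Out[4]. Each day gets one login time and one logoff time, expressed as decimal hours so both lines can share a y-axis. The helper name and the sample rows for pc42 are hypothetical.

```python
import pandas as pd


def daily_times_for_pc(login_data, pc="pc42", month=1):
    """One row per day with login and logoff times as decimal hours."""
    df = login_data[login_data["pc"] == pc].copy()
    df["datetime"] = pd.to_datetime(df["datetime"])
    df = df[df["datetime"].dt.month == month].sort_values("datetime")
    df["date"] = df["datetime"].dt.normalize()
    # Decimal hours (08:30 -> 8.5) give both lines a shared numeric y-axis.
    df["hour"] = df["datetime"].dt.hour + df["datetime"].dt.minute / 60
    # 'first' keeps the earliest event of each kind per day (rows are sorted).
    return df.pivot_table(index="date", columns="action", values="hour", aggfunc="first")


# Hypothetical sample rows; only the pc42 rows in January are kept.
sample = pd.DataFrame({
    "datetime": ["2022-01-01 08:30:00", "2022-01-01 17:15:00",
                 "2022-01-02 09:00:00", "2022-01-02 18:00:00",
                 "2022-01-02 08:00:00", "2022-02-01 08:00:00"],
    "action": ["login", "logoff", "login", "logoff", "login", "login"],
    "pc": ["pc42", "pc42", "pc42", "pc42", "pc41", "pc42"],
})
times = daily_times_for_pc(sample)
```

The two calls from the hint would then be `plt.plot(times.index, times["login"], label="login")` and `plt.plot(times.index, times["logoff"], label="logoff")`.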

In [14]:

Hint: Filter the data and then refer back to Question 4 from Part 1 to format the data correctly

(1 mark)

In [15]:

(Advanced) Question 4: Extend the above, now showing a node for every possible user. The edge connections should be as above, for emails sent by Security staff on 5th January 2022. You should use a shell layout for your network plot.

Hint: Think about how to include all users as nodes. You may even include a dummy node and remove this in your processing depending on how you form your edgelist - https://networkx.org/documentation/stable/index.html

(3 marks)
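One way to follow the hint without a dummy node is to add every employee's address as a node before adding any edges. A minimal sketch, assuming `email_data` has `datetime`, `sender` and `recipient` columns (as in Out[38]) and that `employee_data` has `role` and `email` columns (as in Out[3]). The function name and sample rows are hypothetical.

```python
import pandas as pd


def security_email_network(email_data, employee_data, day="2022-01-05"):
    """Return (nodes, edges): every employee address as a node, and one
    sender->recipient edge per email sent by Security staff on `day`."""
    security = set(employee_data.loc[employee_data["role"] == "Security", "email"])
    ts = pd.to_datetime(email_data["datetime"])
    mask = email_data["sender"].isin(security) & (ts.dt.normalize() == pd.Timestamp(day))
    sent = email_data[mask]
    edges = list(zip(sent["sender"], sent["recipient"]))
    # Including all employees means users with no qualifying email still
    # appear on the shell layout, so no dummy node is needed.
    nodes = sorted(employee_data["email"].unique())
    return nodes, edges


employees = pd.DataFrame({
    "role": ["Security", "Security", "Finance", "Technical"],
    "email": ["usr-kga@uwetech.com", "usr-cgh@uwetech.com",
              "usr-sfo@uwetech.com", "usr-lqi@uwetech.com"],
})
emails = pd.DataFrame({
    "datetime": ["2022-01-05 09:00:00", "2022-01-05 10:00:00",
                 "2022-01-06 09:00:00", "2022-01-05 23:59:59"],
    "sender": ["usr-kga@uwetech.com", "usr-sfo@uwetech.com",
               "usr-kga@uwetech.com", "usr-cgh@uwetech.com"],
    "recipient": ["usr-sfo@uwetech.com", "usr-kga@uwetech.com",
                  "usr-lqi@uwetech.com", "usr-lqi@uwetech.com"],
})
nodes, edges = security_email_network(emails, employees)
```

With networkx, the graph could then be built and drawn with `G = nx.DiGraph()`, `G.add_nodes_from(nodes)`, `G.add_edges_from(edges)` and `nx.draw(G, pos=nx.shell_layout(G), with_labels=True)`.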

In [16]:

Question 5: Show a comparison between the files accessed by HR staff, Services staff, and Security staff, during January. You will need to think of a suitable way to convey this information within a single plot so that comparison of activity can be made easily.

Hint: Think which plot enables you to make comparisons between two attributes, and then think what the attributes would need to be for mapping three job roles against the possible set of files accessed.

(4 marks)
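A role × filename count matrix is one way to read the hint. Each role becomes a row and each file a column, which suits a heatmap. A minimal sketch, assuming `file_data` has `datetime`, `user` and `filename` columns (as in Out[25]) and that each user appears only once in `employee_data`. The usr-yqu duplicate discussed in the investigation would otherwise double-count rows. The function name, the Services user and the filenames other than `/docs/clients` are hypothetical.

```python
import pandas as pd


def file_access_matrix(file_data, employee_data, roles=("HR", "Services", "Security"), month=1):
    """Count file accesses per (role, filename) for the given roles and month."""
    merged = file_data.merge(employee_data[["user", "role"]], on="user", how="left")
    ts = pd.to_datetime(merged["datetime"])
    subset = merged[merged["role"].isin(roles) & (ts.dt.month == month)]
    # Rows = roles, columns = filenames; unseen roles are filled with zeros.
    return pd.crosstab(subset["role"], subset["filename"]).reindex(list(roles), fill_value=0)


employees = pd.DataFrame({
    "user": ["usr-nxs", "usr-kga", "usr-svc", "usr-wkx"],  # usr-svc is hypothetical
    "role": ["HR", "Security", "Services", "Director"],
})
files = pd.DataFrame({
    "datetime": ["2022-01-03 10:00:00", "2022-01-04 10:00:00", "2022-01-04 11:00:00",
                 "2022-01-05 12:00:00", "2022-02-01 09:00:00", "2022-01-05 13:00:00"],
    "user": ["usr-nxs", "usr-nxs", "usr-kga", "usr-svc", "usr-nxs", "usr-wkx"],
    "filename": ["/docs/payroll", "/docs/payroll", "/docs/clients",
                 "/docs/clients", "/docs/clients", "/docs/clients"],
})
matrix = file_access_matrix(files, employees)
```

`plt.imshow(matrix, aspect="auto")` with role and filename tick labels would give a heatmap. Alternatively, `matrix.T.plot(kind="bar")` gives a grouped bar chart with the three roles side by side for each file.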

In [17]:

Question 6: Carry on your own investigation to find the anomalous activity across all data files provided. Provide clear evidence and justification for your investigative steps.

Marks are awarded for:

  • a clear explanation of the steps you take to complete your investigation (5)
  • suitable use of data analysis with clear explanation (6)
  • suitable use of visualisation methods with clear annotation (6)
  • identifying all of the suspicious events (8)

(25 marks)

Beginning of investigation

The goal of this investigation is to find unusual activity across the data files provided: login/logoff behaviour, file access, USB actions, website visits and email correspondence.

In [18]:
login data
datetime    0
user        0
action      0
pc          0
dtype: int64
email data
datetime     0
sender       0
recipient    0
dtype: int64
employee data
user     0
role     0
email    0
pc       0
dtype: int64
usb data
datetime    0
user        0
action      0
pc          0
dtype: int64
file data
datetime    0
user        0
filename    0
dtype: int64
web data
datetime    0
user        0
website     0
dtype: int64
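The per-dataframe counts above are consistent with calling `isnull().sum()` on each dataframe. As a minimal sketch, the same check could be combined with a count of fully duplicated rows. The helper name is hypothetical. A repeated `user` with different roles (such as usr-yqu, found later) is not a duplicate row, so that case needs a separate `employee_data["user"].duplicated()` check.

```python
import pandas as pd


def data_quality_report(frames):
    """Missing values and fully duplicated rows for each named dataframe."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "frame": name,
            "missing_values": int(df.isnull().sum().sum()),
            "duplicate_rows": int(df.duplicated().sum()),
        })
    return pd.DataFrame(rows).set_index("frame")


report = data_quality_report({
    "a": pd.DataFrame({"x": [1, 1, None], "y": ["p", "p", "q"]}),
    "b": pd.DataFrame({"x": [1, 2]}),
})
```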
In [19]:

Investigate login_data

Explanation: A global analysis was carried out to understand general login and logoff patterns, independent of individual users. The goal of this step was to find any anomalous spikes or patterns in login and logoff behaviour across the whole dataset.

Evidence: A bar chart was plotted to show the distribution of login and logoff actions over time. Consistent activity outside regular business hours was treated as a potential anomaly, as were any other unexpected patterns.
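A minimal sketch of the counting behind such a bar chart. It also pulls out the rows that fall outside business hours. The notebook does not define "business hours", so the 07:00 to 19:00 window is an assumption loosely based on usr-sfo's logins (07:15 to 08:47) and logoffs (17:06 to 18:27) in Out[5]. The helper name is hypothetical; the sample rows are copied from Out[4] and Out[5].

```python
import pandas as pd


def hourly_activity(login_data, start_hour=7, end_hour=19):
    """Hourly login/logoff counts plus the rows outside [start_hour, end_hour)."""
    hour = pd.to_datetime(login_data["datetime"]).dt.hour.rename("hour")
    hourly = pd.crosstab(hour, login_data["action"])
    outside = login_data[(hour < start_hour) | (hour >= end_hour)]
    return hourly, outside


sample = pd.DataFrame({
    "datetime": ["2022-01-01 00:02:43", "2022-01-01 08:12:08",
                 "2022-01-01 17:59:48", "2022-10-31 23:59:09"],
    "user": ["usr-zrp", "usr-sfo", "usr-sfo", "usr-ubr"],
    "action": ["login", "login", "logoff", "logoff"],
    "pc": ["pc15", "pc3", "pc3", "pc119"],
})
hourly, outside = hourly_activity(sample)
```

`hourly.plot(kind="bar")` draws the distribution. `outside["user"].value_counts()` then shows who accounts for most of the out-of-hours actions.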

In [20]:

Justification: Users with more than two login actions were given special attention, as this may indicate unusual login behaviour or a possible security issue.

Evidence: To show the number of login actions for each user, a table was created. Users who had made more than two login attempts were noted and marked for additional examination.

In [20]:
Out[20]:
array(['usr-yqu'], dtype=object)
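Every user has hundreds of logins over the ten months (608 login/logoff rows for usr-sfo alone in Out[5]), so the "more than two" threshold presumably applies per day. That reading is an assumption. A minimal sketch of that check, returning an array shaped like the output above. The helper name and sample rows are hypothetical.

```python
import pandas as pd


def users_with_many_daily_logins(login_data, threshold=2):
    """Users with more than `threshold` login actions on at least one day."""
    logins = login_data[login_data["action"] == "login"]
    day = pd.to_datetime(logins["datetime"]).dt.normalize().rename("day")
    per_day = logins.groupby([logins["user"], day]).size()
    flagged = per_day[per_day > threshold]
    return flagged.index.get_level_values("user").unique().to_numpy()


sample = pd.DataFrame({
    "datetime": ["2022-01-01 08:00:00", "2022-01-01 08:05:00", "2022-01-01 13:00:00",
                 "2022-01-01 08:12:08", "2022-01-01 12:00:00", "2022-01-01 17:59:48"],
    "user": ["usr-yqu", "usr-yqu", "usr-yqu", "usr-sfo", "usr-sfo", "usr-sfo"],
    "action": ["login", "login", "login", "login", "login", "logoff"],
})
flagged_users = users_with_many_daily_logins(sample)
```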

Here we can see that the suspected user 'usr-yqu' is responsible for the actions recorded outside working hours. We will dig deeper to understand what this user was doing during that time and whether any other user is responsible for this unusual activity.

Investigate employee_data

Now that we have a name, let's look up usr-yqu in the employee data and check for any suspicious details.

In [15]:
Out[15]:
user role email pc
166 usr-yqu Security usr-yqu@uwetech.com pc166
202 usr-yqu Services usr-yqu@uwetech.com pc202

usr-yqu has two different roles (Security and Services), each assigned to a different PC (pc166 and pc202). As advised by the lecturer, this user will be removed from all the dataframes before continuing the investigation.
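A minimal sketch of detecting such users and removing them from every dataframe. It assumes `email_data` identifies people by address (the `sender` and `recipient` columns), whereas the other frames use a `user` column. The helper name and sample rows are hypothetical.

```python
import pandas as pd


def drop_ambiguous_users(employee_data, frames):
    """Remove users that appear more than once in employee_data from every frame."""
    dup_mask = employee_data["user"].duplicated(keep=False)
    bad_users = set(employee_data.loc[dup_mask, "user"])
    bad_emails = set(employee_data.loc[dup_mask, "email"])
    cleaned = {}
    for name, df in frames.items():
        keep = pd.Series(True, index=df.index)
        if "user" in df.columns:
            keep &= ~df["user"].isin(bad_users)
        # email_data identifies people by address rather than by user id
        for col in ("sender", "recipient"):
            if col in df.columns:
                keep &= ~df[col].isin(bad_emails)
        cleaned[name] = df[keep]
    return bad_users, cleaned


employees = pd.DataFrame({
    "user": ["usr-yqu", "usr-yqu", "usr-sfo"],
    "role": ["Security", "Services", "Finance"],
    "email": ["usr-yqu@uwetech.com", "usr-yqu@uwetech.com", "usr-sfo@uwetech.com"],
    "pc": ["pc166", "pc202", "pc3"],
})
logins = pd.DataFrame({"user": ["usr-yqu", "usr-sfo"], "action": ["login", "login"]})
emails = pd.DataFrame({
    "sender": ["usr-yqu@uwetech.com", "usr-sfo@uwetech.com"],
    "recipient": ["usr-sfo@uwetech.com", "usr-kga@uwetech.com"],
})
bad, cleaned = drop_ambiguous_users(
    employees, {"employee": employees, "login": logins, "email": emails})
```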

In [21]:
In [22]:

Investigate file_data

Explanation: In order to determine which files were accessed by a single user per role, file access patterns were examined. The goal of this step was to find any unusual or unauthorised file accesses.

Evidence: The number of files accessed by a single person per role was shown using bar charts. Files that did not follow the expected shared access pattern within a role were used to identify anomalies.
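A minimal sketch of the "single user per role" check, returning rows shaped like Out[25]. It assumes each user appears once in `employee_data` (that is, after removing usr-yqu). The helper name is hypothetical. The usr-ezr `/docs/clients` row is copied from Out[25]; the other sample rows and filenames are made up.

```python
import pandas as pd


def single_user_files_per_role(file_data, employee_data):
    """File accesses where only one user within that role ever opened the file."""
    merged = file_data.merge(employee_data[["user", "role"]], on="user")
    n_users = merged.groupby(["role", "filename"])["user"].nunique()
    lonely = n_users[n_users == 1].reset_index()[["role", "filename"]]
    result = merged.merge(lonely, on=["role", "filename"])
    return result[["datetime", "user", "role", "filename"]]


employees = pd.DataFrame({
    "user": ["usr-wkx", "usr-ezr", "usr-sfo", "usr-agk"],
    "role": ["Director", "Director", "Finance", "Finance"],
})
files = pd.DataFrame({
    "datetime": ["2022-05-02 10:00:00", "2022-05-03 11:00:00",
                 "2022-05-05 20:00:32.912540", "2022-05-04 09:00:00",
                 "2022-05-04 10:00:00"],
    "user": ["usr-wkx", "usr-ezr", "usr-ezr", "usr-sfo", "usr-agk"],
    "filename": ["/docs/board", "/docs/board", "/docs/clients",
                 "/docs/finance", "/docs/finance"],
})
lonely_access = single_user_files_per_role(files, employees)
```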

In [23]:
In [24]:
In [25]:
Out[25]:
datetime user role filename
0 2022-05-05 20:00:32.912540 usr-ezr Director /docs/clients

One user stands out: 'usr-ezr', a Director, is the only member of his department who accessed the file '/docs/clients', and he did so outside office hours (20:00).

In [26]:
Out[26]:
user role email pc
35 usr-ezr Director usr-ezr@uwetech.com pc35

Investigate usb_data

Investigation of USB Actions:

Evidence: Scatter plots were made to show the dates and times of USB insertions and removals. Unusual behaviour was treated as a possible anomaly, particularly when it occurred outside business hours or when several actions occurred on the same day.

In [27]:

Here we can see that usr-ezr uses two different PCs: pc249 is the one used for the USB actions, while pc35 is the one assigned to him in the employee data file. This looks highly suspicious.
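A minimal sketch of how such events could be found for every user at once. Each USB event is compared against the PC assigned to the acting user, and the event is labelled with the PC's owner. The helper name is hypothetical. The two usr-ezr rows are copied from Out[29]; the usr-xfr row is made up.

```python
import pandas as pd


def usb_on_foreign_pcs(usb_data, employee_data):
    """USB events on a PC other than the one assigned to the acting user."""
    assigned = employee_data[["user", "pc"]].rename(columns={"pc": "assigned_pc"})
    owners = employee_data[["pc", "user"]].rename(columns={"user": "pc_owner"})
    merged = (usb_data.merge(assigned, on="user", how="left")
                      .merge(owners, on="pc", how="left"))
    return merged[merged["pc"] != merged["assigned_pc"]]


employees = pd.DataFrame({"user": ["usr-ezr", "usr-xfr"], "pc": ["pc35", "pc249"]})
usb = pd.DataFrame({
    "datetime": ["2022-05-12 19:32:56.753903", "2022-05-12 19:51:08.339557",
                 "2022-05-13 10:00:00.000000"],
    "user": ["usr-ezr", "usr-ezr", "usr-xfr"],
    "action": ["usb_insert", "usb_remove", "usb_insert"],
    "pc": ["pc249", "pc249", "pc249"],
})
foreign = usb_on_foreign_pcs(usb, employees)
```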

In [28]:
Out[28]:
user role email pc
249 usr-xfr HR usr-xfr@uwetech.com pc249
In [29]:
Out[29]:
datetime user action pc
394729 2022-05-12 19:32:56.753903 usr-ezr usb_insert pc249
394762 2022-05-12 19:51:08.339557 usr-ezr usb_remove pc249
394851 2022-05-12 20:56:39.215300 usr-ezr usb_insert pc249
394864 2022-05-12 21:05:35.081857 usr-ezr usb_remove pc249
406217 2022-05-16 17:06:04.017364 usr-ezr usb_insert pc249
406327 2022-05-16 17:51:52.061167 usr-ezr usb_remove pc249
406504 2022-05-16 19:07:47.812389 usr-ezr usb_insert pc249
406513 2022-05-16 19:12:19.605206 usr-ezr usb_remove pc249
415821 2022-05-20 07:49:04.043470 usr-ezr usb_insert pc249
415892 2022-05-20 08:13:16.510946 usr-ezr usb_remove pc249
418175 2022-05-20 20:38:06.136269 usr-ezr usb_insert pc249
418225 2022-05-20 21:23:13.319032 usr-ezr usb_remove pc249

Explanation: In order to verify the validity of the user's actions and spot any odd patterns, the login and logoff activities for the user "usr-xfr" were carefully examined.

Evidence: The login and logoff actions over time were visualised using bar charts and scatter plots. Making sure that the login and logout patterns matched the scheduled work hours and did not display anomalous behaviour was the main goal.

In [30]:

Although usr-ezr inserted and removed a USB device, he never logged in to or off pc249. How did the PC recognise him? Is this another dataset mistake, or is it part of the investigation? Is the USB device also uniquely assigned? This is highly suspicious.

Evidence: The login and logoff records for 'pc249' show that 'usr-xfr' logged in and out consistently, with balanced login and logoff records. By contrast, there are no login or logoff records at all for 'usr-ezr' on this PC, despite his USB actions.

Rationale: It is notable that 'usr-ezr' never logged in to or out of 'pc249' while performing USB actions on it. This raises the question of how the computer attributed the USB actions to 'usr-ezr' without any matching login information.
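A minimal sketch of this check generalised to all users. It flags every USB event whose (user, pc) pair never appears in a login record, using a left merge with `indicator=True`. The helper name is hypothetical. The usr-ezr USB row is copied from Out[29]; the other rows are made up.

```python
import pandas as pd


def usb_without_login(usb_data, login_data):
    """USB events whose (user, pc) pair never appears in a login record."""
    pairs = login_data.loc[login_data["action"] == "login", ["user", "pc"]].drop_duplicates()
    merged = usb_data.merge(pairs, on=["user", "pc"], how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


usb = pd.DataFrame({
    "datetime": ["2022-05-12 19:32:56.753903", "2022-05-13 10:00:00.000000"],
    "user": ["usr-ezr", "usr-xfr"],
    "action": ["usb_insert", "usb_insert"],
    "pc": ["pc249", "pc249"],
})
logins = pd.DataFrame({
    "user": ["usr-xfr", "usr-xfr", "usr-ezr"],
    "action": ["login", "logoff", "login"],
    "pc": ["pc249", "pc249", "pc35"],
})
unmatched = usb_without_login(usb, logins)
```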

In [31]:

Here we can see the login/logoff hours of usr-xfr, which look normal. We will compare this data with the USB actions.

In [32]:

This scatter plot displays the USB actions of usr-ezr and shows that the USB device was inserted and removed while usr-xfr was not working, meaning that usr-ezr performed unauthorised actions on pc249.

Evidence: Examining the USB actions showed that 'usr-ezr' behaved strangely on PC 'pc249'. Repeated USB insertions and removals by this user suggest possible data transfer or unauthorised access.

Justification: Because this behaviour happened outside regular business hours, it raises suspicions of unauthorised activity. Data security concerns grew when it was discovered that 'pc249' is assigned to 'usr-xfr', an HR staff member.
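A minimal sketch of the comparison behind the scatter plot. It pairs the PC owner's logins with the following logoffs to form sessions, then keeps the USB events that fall outside every session. It assumes login and logoff rows strictly alternate, as they do for usr-sfo in Out[5]. The helper names and usr-xfr's session times are hypothetical. The two usr-ezr USB rows are copied from Out[29].

```python
import pandas as pd


def owner_sessions(login_data, user, pc):
    """Pair each login with the following logoff for one user on one PC."""
    df = login_data[(login_data["user"] == user) & (login_data["pc"] == pc)].copy()
    df["datetime"] = pd.to_datetime(df["datetime"])
    df = df.sort_values("datetime")
    starts = df.loc[df["action"] == "login", "datetime"].tolist()
    ends = df.loc[df["action"] == "logoff", "datetime"].tolist()
    # Assumes strictly alternating login/logoff rows.
    return list(zip(starts, ends))


def usb_outside_sessions(usb_events, sessions):
    """USB events that fall outside every (start, end) session."""
    ts = pd.to_datetime(usb_events["datetime"])
    inside = ts.apply(lambda t: any(start <= t <= end for start, end in sessions)).astype(bool)
    return usb_events[~inside]


logins = pd.DataFrame({  # hypothetical sessions for the PC owner
    "datetime": ["2022-05-12 08:00:00", "2022-05-12 17:00:00",
                 "2022-05-16 08:00:00", "2022-05-16 17:00:00"],
    "user": ["usr-xfr"] * 4,
    "action": ["login", "logoff", "login", "logoff"],
    "pc": ["pc249"] * 4,
})
usb = pd.DataFrame({
    "datetime": ["2022-05-12 19:32:56.753903", "2022-05-16 17:06:04.017364",
                 "2022-05-16 10:00:00.000000"],
    "user": ["usr-ezr", "usr-ezr", "usr-xfr"],
    "action": ["usb_insert", "usb_insert", "usb_insert"],
    "pc": ["pc249", "pc249", "pc249"],
})
sessions = owner_sessions(logins, "usr-xfr", "pc249")
outside = usb_outside_sessions(usb, sessions)
```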

Investigate web_data

Evidence: The distribution of website accesses was visualised using bar charts and network graphs. Unusual website visits were viewed as possible anomalies, particularly if they differed from departmental standards.

In [33]:
In [34]:
In [35]:
In [36]:

Here we can see all the websites accessed by users. Some websites have very low occurrence counts, and some of the links do not work (such as www.broadcaster.com, www.kalilinux.com and www.helpineedasecurity.net) or are unsafe, probably hacking websites. However, none of these websites were ever accessed by our suspect usr-ezr or from any of the PCs involved.

Evidence: Website access analysis showed that most users consistently accessed the same websites. 'usr-ezr' showed no departure from usual patterns and did not visit any dubious websites.

Justification: Although some users visited potentially dangerous websites, 'usr-ezr' behaved normally online. This suggests that the USB and login anomalies are not directly linked to web activity.
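A minimal sketch of the two steps described above: find rarely visited websites, then check whether the suspect visited any of them. The cut-off of five visits is an assumption. The helper names, the `usr-abc` user and `www.example.com` are hypothetical; the two rare site names are taken from the text above.

```python
import pandas as pd


def rare_websites(web_data, max_visits=5):
    """Websites visited at most `max_visits` times across the whole dataset."""
    counts = web_data["website"].value_counts()
    return counts[counts <= max_visits]


def suspect_visits(web_data, user, websites):
    """Rows where `user` visited any of the given websites."""
    return web_data[(web_data["user"] == user) & web_data["website"].isin(websites)]


web = pd.DataFrame({
    "user": ["usr-ezr", "usr-xfr", "usr-ezr", "usr-sfo", "usr-agk", "usr-xfr",
             "usr-abc", "usr-abc"],
    "website": ["www.example.com"] * 6 + ["www.kalilinux.com", "www.broadcaster.com"],
})
rare = rare_websites(web)
ezr_rare = suspect_visits(web, "usr-ezr", rare.index)
```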

Investigate email_data

Justification: Email correspondence between "usr-ezr" and "usr-xfr" was analysed to verify the correspondence's authenticity and spot any odd email behaviour.

In [37]:

Nothing abnormal was found here. First, I used a networkx graph to check whether all emails sent by usr-ezr used the same domain, and nothing suspicious was found.

In [38]:
Out[38]:
datetime sender recipient
1828585 2022-05-15 22:02:41.479545 usr-ezr@uwetech.com usr-xfr@uwetech.com
1910287 2022-05-22 01:42:05.665565 usr-ezr@uwetech.com usr-xfr@uwetech.com
1942871 2022-05-24 11:01:32.289373 usr-ezr@uwetech.com usr-xfr@uwetech.com

Finally, I checked whether usr-ezr had sent any emails to usr-xfr. He had, three times, but the dates do not clearly line up with the USB activity. The email sent on 15/05/2022 at 22:02 may be worth a closer look, as the second round of USB actions took place the next day (16/05/2022 at 17:06 and 19:07).

Evidence: The emails from 'usr-ezr' to 'usr-xfr' were sent on 15, 22 and 24 May 2022. Only the 15 May email precedes a USB session (16 May, from 17:06). The 22 and 24 May emails were sent after the last recorded USB activity on 20 May, so at most the first email could be connected to the USB actions.
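A minimal sketch of how the gap between each email and the next USB event could be computed with `pd.merge_asof` (`direction="forward"`). The email rows are copied from Out[38]; the USB rows are the six `usb_insert` rows from Out[29]. The helper name is hypothetical.

```python
import pandas as pd


def email_to_next_usb(emails, usb_events):
    """For each email, the next USB event at or after it and the time gap."""
    left = emails.assign(datetime=pd.to_datetime(emails["datetime"])).sort_values("datetime")
    right = (
        usb_events.assign(next_usb=pd.to_datetime(usb_events["datetime"]))
        [["next_usb", "action"]]
        .sort_values("next_usb")
    )
    merged = pd.merge_asof(left, right, left_on="datetime", right_on="next_usb",
                           direction="forward")
    merged["gap"] = merged["next_usb"] - merged["datetime"]
    return merged


emails = pd.DataFrame({
    "datetime": ["2022-05-15 22:02:41.479545", "2022-05-22 01:42:05.665565",
                 "2022-05-24 11:01:32.289373"],
    "sender": ["usr-ezr@uwetech.com"] * 3,
    "recipient": ["usr-xfr@uwetech.com"] * 3,
})
usb_inserts = pd.DataFrame({
    "datetime": ["2022-05-12 19:32:56.753903", "2022-05-12 20:56:39.215300",
                 "2022-05-16 17:06:04.017364", "2022-05-16 19:07:47.812389",
                 "2022-05-20 07:49:04.043470", "2022-05-20 20:38:06.136269"],
    "action": ["usb_insert"] * 6,
})
result = email_to_next_usb(emails, usb_inserts)
```

In this sample, only the 15 May email is followed by a USB event (about 19 hours later). The 22 and 24 May emails have no later USB event, so their `next_usb` and `gap` are `NaT`.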

Investigation Summary: Anomalous Activity Detection

A variety of data files were examined in this thorough investigation in order to find any potentially unusual activity pertaining to several facets of employee behaviour. User actions pertaining to file access, USB use, web access, email correspondence, and login/logout events were the main focus. Finding any patterns or behaviours that differed from regular, everyday operations was the goal of the investigation.

Analysis of File Access:

Methodology: User-file interactions were closely examined to identify unusual access patterns. Result: 'usr-ezr' was the only user in the Director role to access '/docs/clients', a sensitive file, and he did so outside regular business hours (2022-05-05 at 20:00).

Examining USB Activity:

Methodology: USB insertion and removal events for 'usr-ezr' were examined. Result: 'usr-ezr' is assigned 'pc35' but performed his USB actions on 'pc249', a device assigned to 'usr-xfr' (HR), suggesting that the device was misused by someone other than its assigned user.

Examining Login and Logoff:

Methodology: Login and logoff events for 'pc249' were examined. They showed a consistent pattern for 'usr-xfr' but no login activity at all for 'usr-ezr'. Result: 'usr-ezr' performed USB operations on 'pc249' without any recorded login, which is highly irregular.

Web Access Analysis:

Methodology: The websites accessed by 'usr-ezr' were analysed. Result: 'usr-ezr' did not access any dubious websites; only frequently visited websites were accessed.

Email Activity Check:

Methodology: Emails sent by 'usr-ezr' were examined. Outcome: No irregularities were found in email communications. No clear link was established between the emails sent to 'usr-xfr' and the USB actions, although the email of 15 May (22:02) preceded the 16 May USB session.

In conclusion, the study revealed possible anomalous activity associated with 'usr-ezr'. The main findings are unauthorised after-hours access to a sensitive file, USB device use on a PC belonging to another employee, and the absence of any login records for that PC. Further investigation is advised to confirm these anomalies and to fully understand the activities of 'usr-ezr' and the possible security threats.

-> Global login/logoff patterns were also analysed, identifying users with more than two login actions who warranted further investigation.

Question 7: Describe what you believe are the key findings of your investigation. You should clearly state the suspect identified, and the sequential order of suspicious events, including the date and time that these occurred. You should then provide your own critical reflection of what has occurred in this scenario, giving justification for any assumptions made. Limit your response to a maximum of 400 words.

Please make clear which dataset you have used for your investigation.

(10 marks)

  • Dataset Used: Several datasets, including file_data, employee_data, usb_data, login_data, web_data, and email_data, were analysed as part of the investigation.

  • Suspect Found: 'usr-ezr' was identified as the main suspect.

  • Order of Suspicious Events in Sequence:

  • File Access (file_data): On 2022-05-05 at 20:00:32, 'usr-ezr' accessed '/docs/clients'. He was the only Director to do so, and the access took place outside office hours, suggesting possible unauthorised access.

  • USB Activity (usb_data): On 2022-05-12, 2022-05-16 and 2022-05-20, 'usr-ezr' performed six USB insert/remove pairs on 'pc249', which is assigned to 'usr-xfr' (12 May 19:32 to 21:05, 16 May 17:06 to 19:12, 20 May 07:49 to 08:13 and 20:38 to 21:23), raising suspicions of device misuse.

  • Login/Logoff Activity (login_data): There are no login or logoff records for 'usr-ezr' on 'pc249', even though USB actions were recorded under his name on 2022-05-12, 2022-05-16 and 2022-05-20, indicating a possible security threat.

  • Web Access (web_data): No suspicious web access was found for 'usr-ezr'; only frequently visited websites were accessed.

  • Email Communication (email_data): 'usr-ezr' emailed 'usr-xfr' on 15, 22 and 24 May. No direct connection to the USB actions was established, although the 15 May email (22:02) preceded the 16 May USB session.
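A minimal sketch of how the sequence above could be assembled into a single chronological timeline by concatenating the relevant rows from each log and sorting by time. The helper name is hypothetical. The rows are copied from Out[25], Out[29] (four of the twelve USB rows) and Out[38].

```python
import pandas as pd


def build_timeline(sources):
    """Merge event tables into one chronologically sorted timeline."""
    parts = []
    for name, df in sources.items():
        part = pd.DataFrame({"datetime": pd.to_datetime(df["datetime"]), "source": name})
        # Join the remaining columns of each row into a readable description.
        part["detail"] = df.drop(columns="datetime").astype(str).apply(" ".join, axis=1)
        parts.append(part)
    return pd.concat(parts, ignore_index=True).sort_values("datetime", ignore_index=True)


file_rows = pd.DataFrame({"datetime": ["2022-05-05 20:00:32.912540"], "user": ["usr-ezr"],
                          "role": ["Director"], "filename": ["/docs/clients"]})
usb_rows = pd.DataFrame({
    "datetime": ["2022-05-12 19:32:56.753903", "2022-05-12 19:51:08.339557",
                 "2022-05-16 17:06:04.017364", "2022-05-20 21:23:13.319032"],
    "user": ["usr-ezr"] * 4,
    "action": ["usb_insert", "usb_remove", "usb_insert", "usb_remove"],
    "pc": ["pc249"] * 4,
})
email_rows = pd.DataFrame({
    "datetime": ["2022-05-15 22:02:41.479545", "2022-05-22 01:42:05.665565",
                 "2022-05-24 11:01:32.289373"],
    "sender": ["usr-ezr@uwetech.com"] * 3,
    "recipient": ["usr-xfr@uwetech.com"] * 3,
})
timeline = build_timeline({"file_data": file_rows, "usb_data": usb_rows,
                           "email_data": email_rows})
```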

Critical Thoughts:

The inquiry revealed a series of events that suggested 'usr-ezr' was involved in possibly unlawful and suspicious activities. The access to '/docs/clients,' a sensitive file, outside of regular business hours, along with USB actions on a computer belonging to another employee, suggests a lack of concern for security procedures. There are concerns regarding how 'usr-ezr' obtained access to the PC without the required authorization given the absence of corresponding login activities during USB actions on 'pc249'.

Although the investigation shows clear evidence of unusual behaviour, critical analysis shows that further forensic analysis is necessary. To determine the scope of 'usr-ezr's' activities, additional actions include a thorough review of network logs, user authentication records, and possible coordination with IT security. Furthermore, speaking with 'usr-xfr' could reveal any cooperation or improper use of access between the two workers.

To sum up, this inquiry is the first step towards figuring out possible security threats in the company. To prevent such incidents in the future, preventive measures should be put in place, departmental collaboration should be emphasised, and a more thorough analysis should be the focus of the following actions.
